CEM: Constrained Entropy Maximization for Task-Agnostic Safe Exploration
Abstract
In the absence of assigned tasks, a learning agent typically seeks to explore its environment efficiently. However, the pursuit of exploration brings more safety risks. An under-explored aspect of reinforcement learning is how to achieve safe and efficient exploration when the task is unknown. In this paper, we propose a practical Constrained Entropy Maximization (CEM) algorithm to solve task-agnostic safe exploration problems, which naturally require a finite horizon and undiscounted constraints on safety costs. The CEM algorithm aims to learn a policy that maximizes state entropy under the premise of safety. To avoid approximating the state density in complex domains, CEM leverages a k-nearest neighbor entropy estimator to evaluate the efficiency of exploration. In terms of safety, CEM minimizes the safety costs and adaptively trades off safety and exploration based on the current constraint satisfaction. The empirical analysis shows that CEM enables the acquisition of a safe exploration policy in complex environments, resulting in improved performance in both safety and sample efficiency for target tasks.
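The two ingredients named in the abstract, a k-nearest neighbor entropy estimate and a constraint-dependent trade-off, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `knn_entropy_score` and `update_multiplier`, the simplified estimator (entropy only up to additive constants), and the Lagrange-multiplier style update rule are all assumptions made for illustration.

```python
import math
import random


def knn_entropy_score(states, k=3, eps=1e-8):
    """Simplified k-NN entropy score (hypothetical helper).

    Returns d * mean(log distance to the k-th nearest neighbour), which
    tracks differential entropy only up to additive constants that depend
    on n, k and d. Uses brute-force O(n^2) distance computation.
    """
    n = len(states)
    if n <= k:
        raise ValueError("need more than k states")
    d = len(states[0])
    total = 0.0
    for i, s in enumerate(states):
        # Distances from state i to every other visited state, ascending.
        dists = sorted(math.dist(s, t) for j, t in enumerate(states) if j != i)
        total += math.log(dists[k - 1] + eps)
    return d * total / n


def update_multiplier(lam, episode_cost, cost_limit, lr=0.1):
    """Assumed adaptive trade-off: raise the penalty weight when the
    undiscounted episode cost exceeds the limit, lower it otherwise,
    and keep it non-negative."""
    return max(0.0, lam + lr * (episode_cost - cost_limit))


# Example: the same visited states, once confined and once spread out.
random.seed(0)
narrow = [(random.random(), random.random()) for _ in range(50)]
wide = [(10.0 * x, 10.0 * y) for x, y in narrow]
narrow_score = knn_entropy_score(narrow)
wide_score = knn_entropy_score(wide)
```

In this sketch the spread-out states receive the higher score, so a policy rewarded with this quantity is pushed toward broader state coverage, while the multiplier grows only while the cost limit is violated.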
Similar resources
Conditional entropy maximization for PET
Maximum Likelihood (ML) estimation is extensively used for estimating emission densities from clumped and incomplete measurement data in the Positron Emission Tomography (PET) modality. Reconstructions produced by the ML algorithm have been found to be noisy because it does not make use of available prior knowledge. Bayesian estimation provides such a platform for the inclusion of prior knowledge in the recons...
Maximum Conditional Likelihood via Bound Maximization and the CEM Algorithm
We present the CEM (Conditional Expectation Maximization) algorithm as an extension of the EM (Expectation Maximization) algorithm to conditional density estimation under missing data. A bounding and maximization process is given to specifically optimize conditional likelihood instead of the usual joint likelihood. We apply the method to conditioned mixture models and use bounding techniques to ...
Safe exploration for reinforcement learning
In this paper we define and address the problem of safe exploration in the context of reinforcement learning. Our notion of safety is concerned with states or transitions that can lead to damage and thus must be avoided. We introduce the concepts of a safety function for determining a state’s safety degree and that of a backup policy that is able to lead the controlled system from a critical st...
Constrained Utility Maximization for Generating Visual Skims
In this paper, we present a novel algorithm to generate visual skims, that do not contain audio, from computable scenes. Visual skims are useful for browsing digital libraries, and for on-demand summaries in set-top boxes. A computable scene is a chunk of data that exhibits consistencies with respect to chromaticity, lighting and sound. First, we define visual complexity of a shot to be its Kol...
Interleaved Algorithms for Constrained Submodular Function Maximization
We present a combinatorial algorithm that improves the best known approximation ratio for monotone submodular maximization under a knapsack and a matroid constraint to $\frac{1-e^{-2}}{2}$. This classic problem is known to be hard to approximate within a factor better than $1-1/e$. We show that the algorithm can be extended to yield a ratio of $\frac{1-e^{-(k+1)}}{k+1}$ for the problem with a single knapsack and the int...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i9.26281